class: center, middle, inverse, title-slide .title[ # Statistical machine learning methods for neuroscientific data:
Analysing brain activity, psychopathology and
cognitive function in adolescents
] .subtitle[ ## Final Seminar
Owen Forbes
May 5 2023
] .author[ ### ] .date[ ### Distinguished Professor Kerrie Mengersen (Principal Supervisor)
Dr Paul Wu (Associate Supervisor)
Dr Edgar Santos Fernandez (Associate Supervisor)
Queensland University of Technology ] --- class: list-space <style> .list-space li { padding: 0.25cm; } .list-nobullet li { list-style-type:none; } .center2 { margin: 0; position: absolute; top: 30%; left: 40%; -ms-transform: translate(-35%, -35%); transform: translate(-35%, -35%); } </style> #### Mental Health in Adolescence --- #### Mental Health in Adolescence .content-box-blue[ - Mental health issues are a significant and growing problem in youth populations (Sawyer et al., 2018; McGorry & Mei, 2018)] -- .content-box-red[ - Many mental health problems & disorders are thought to emerge in adolescence; period of rapid brain development and change (Paus et al., 2008; McGorry et al., 2011)] -- .content-box-green[ - Mental health risk in childhood & adolescence is understood to be **pluripotential** - Non-specific risk overlaps across a number of clinical diagnostic categories (Raballo & Poletti, 2020; Hartmann et al., 2019)] -- 👉 We need **improved data-driven statistical modeling** to better understand the <br> **complex relationships between neurophysiology and mental health in adolescence** 🧠 --- ### Case-control and Data-driven Approaches .pull-left[.content-box-blue[ **Case-control studies** {{content}} ]] -- - Identifies differences between diagnosed and undiagnosed groups {{content}} -- - Limited to find information within diagnostic categories based on clinical judgement {{content}} -- - Limited generalisability / replicability outside of sample (Latzman & DeYoung, 2020) -- .pull-right[.content-box-red[ **Data-driven empirical methods** {{content}} ]] -- - Identifies subgroups of symptoms or individuals based on patterns in data, rather than diagnostic categories / clinician judgement {{content}} -- - Prioritises population-based cohort studies (Latzman & DeYoung, 2020) {{content}} -- - Growing emphasis on data-driven electroencephalography (EEG) research to find new discoveries out of scope from traditional diagnostic categories (Loo et al., 2016; Keizer, 2019) --- background-image: url("data:image/png;base64,#imgs/eeg_diag.png") background-position: top right background-size: 450px ### Electroencephalography (EEG) Data - Records electrical brain activity <br> using electrodes on the scalp - Signals represent cumulative activity <br> of rhythmic firing across networks <br> of many neurons <br> -- .pull-left[.content-box-blue[ **Strengths** {{content}} ]] -- - High temporal resolution for measuring oscillations & dynamic brain states (Burle et al., 2015) {{content}} -- - Low cost, widely available (Liedorp et al., 2009) {{content}} -- - Non-invasive, low participant burden -- .pull-right[.content-box-red[ **Challenges** {{content}} ]] -- - Low signal-to-noise ratio (Wu et al., 2015) - Eye and muscle movements - Background electrical noise - Electrode contact quality {{content}} -- - Non-stationary (Sun and Zhou, 2014) - Significant variation in underlying processes {{content}} -- - High dimensionality - Large number of channels & features --- #### Growing need for increased large scale research studies collecting data on brain, psychopathology and cognitive function -- .pull-left[.content-box-red[ **Longitudinal Adolescent Brain Study (LABS)** {{content}} ]] -- - Longitudinal cohort study of youth in the Sunshine Coast - Follows participants from age 12 - 17 - Recruitment goal: 500 participants {{content}} -- Thesis data scope: - 59 individuals aged 12 years (T1) - Resting state EEG, psychological distress, <br> sleep quality, cognitive function - Chapters 3 & 4 -- .pull-right[.content-box-green[ **Healthy Brain Network (HBN)** {{content}} ]] -- - Large cross-sectional study in New York region - Participants aged 5-21 - Recruited based on perceived clinical concern - Goal: Biobank of 10,000 participants {{content}} -- Thesis data scope: - 503 participants aged 9-15 - Pre-processed resting state EEG, <br> psychopathology and cognitive function - Chapters 5 & 6 --- class: middle, center # Aims & Objectives --- .pull-left[.content-box-red[ (1) Develop an end-to-end pipeline for EEG-based clusters in adolescents **(Ch 3)** ]] -- .pull-right[.content-box-red[ (1a) Using LABS data, examine associations with health measures. **(Ch 3 & 6)** ]] -- .pull-left[.content-box-yellow[ (2) Build Bayesian Model Averaging framework to combine clusters with probabilistic allocation & model-based uncertainty **(Ch 4)** ]] -- .pull-right[.content-box-yellow[ (2a) Assess clinical & applied utility of BMA features with case study using LABS EEG clusters **(Ch 4 & 6)** ]] -- .pull-left[.content-box-green[ (3) Develop nested functional analysis framework for time series data including functional characteristics in latent states **(Ch 5)** ]] -- .pull-right[.content-box-green[ (3a) Investigate associations between functional analysis outputs & health measures using HBN data **(Ch 5 & 6)** ]] -- .pull-left[.content-box-blue[ (4) Replicate and compare feature extraction and clustering pipelines (1) and (3) across LABS and HBN data **(Ch 6)** ]] -- .pull-right[.content-box-blue[ (4a) Explore cross-method, cross-sample replication of robust EEG phenotypes in adolescents using methods in (1) and (3) **(Ch 6)** ]] --- background-image: url("data:image/png;base64,#imgs/watercolour_crop_nologo.jpg") class: middle, center # Chapter 3 # EEG-based clusters differentiate psychological distress, sleep quality and cognitive function in adolescents --- ### Chapter 3: Novel Contributions -- - First work to investigate phenotypes using unsupervised clustering of EEG data from a population sample of adolescents -- - New insights about the presence of resting state EEG phenotypes, and clear associations with psychological distress, sleep quality and cognitive function -- - Novel statistical pipeline implemented in **freely available open source software** (R and MATLAB) for clinicians and researchers to apply in their own work -- - SAGE Award for Best HDR Student Publication (Faculty of Science) 2022 -- - Published in _Biological Psychology_ <img src="data:image/png;base64,#imgs/p1_biopsy_header.png" width="600" /> --- ### Chapter 3: Methods <img src="data:image/png;base64,#imgs/P1Fig1.png" width="800" /> <!-- #### Data Scope --> <!-- - Resting state EEG data, single electrode --> <!-- - 59 individuals in LABS, 12 years old (M = 12.64, SD = 0.32) --> --- ### Chapter 3: Methods .pull-left[ <img src="data:image/png;base64,#imgs/P1Fig2.png" width="250" /> ] .pull-right[ **Features** - Power in canonical bands -- Delta (0-4 Hz), ..., Beta (12-16 Hz) - Total Power - Individual Alpha Frequency - Spectral Edge Frequency - Spectral Entropy **Principal Component Analysis** - 3 PCs explained 80.6% of the variance **Clustering** - k-means - Hierarchical Clustering - Gaussian Mixture Model ] --- ### Chapter 3: Results .pull-left[<iframe src="imgs/plot_3d_km.html" width="500" height="250" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>] .pull-right[<iframe src="imgs/plot_3d_hc_colourfix.html" width="500" height="250" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>] .pull-left[<iframe src="imgs/plot_3d_gmm.html" width="500" height="250" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>] --- ### Core clusters across algorithms <iframe src="imgs/plot_3d_core_only.html" width="1000" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> --- ### Average frequency content across clusters <center> <img src="data:image/png;base64,#imgs/P1Fig5.png" width="450" /> </center> --- ### Health measures across clusters <center> <img src="data:image/png;base64,#imgs/P1Fig6.png" width="800" /> </center> --- ### Health measures across clusters - Bayesian regression models <center> <img src="data:image/png;base64,#imgs/P1Fig7.png" width="800" /> </center> --- #### Exciting preliminary results **Further questions of interest** -- <br> <br> - Can we develop a better way to combine multiple clustering algorithms, incorporating model-based uncertainty? **(Chapter 4)** -- <br> <br> - Can we use more sophisticated methods to characterise EEG, include temporal information, and avoid coarse canonical summary features? **(Chapter 5)** -- <br> <br> - How robust are the phenotypes associated with these clusters? Do we find similar results using different samples, and different methods? **(Chapter 6)** --- background-image: url("data:image/png;base64,#imgs/watercolour_crop_nologo.jpg") class: middle, center # Chapter 4 # _clusterBMA_: Bayesian Model Averaging for Clustering --- ### Chapter 4: Novel Contributions -- - First implementation of Bayesian model averaging to combine solutions across multiple clustering algorithms -- - Offers unique features for ensemble clustering including **quantification of model-based uncertainty** and probabilistic allocation to combined clusters -- - Substantially better accuracy in benchmarking performance compared to other ensemble algorithms, particularly for high-dimensional data with low separation between clusters -- - Implemented in open source & freely available **R package** for data scientists & practitioners to apply across various fields **github.com/of2/clusterBMA** -- - Preprint has been cited - Trending on DeepAI - Used by researchers at UWA and US government agencies - Invited talk to SSA QLD -- - Under review in _PLOS ONE_ --- ## Which clustering algorithm? .pull-left[<iframe src="imgs/plot_3d_km.html" width="500" height="250" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>] .pull-right[<iframe src="imgs/plot_3d_hc_colourfix.html" width="500" height="250" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>] .pull-left[<iframe src="imgs/plot_3d_gmm.html" width="500" height="250" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>] --- ## Inconsistent clustering across algorithms <iframe src="imgs/plot_3d_core_yellow.html" width="1000" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> --- ### Which clustering algorithm? - Different algorithms will emphasise different aspects of clustering structure -- - Choosing one 'best' model often arbitrary, unclear choice -- - ➡️ **Inference not calibrated for model-based uncertainty** -- - Locking into one method loses insights offered by other methods about plausible clustering structure -- .pull-left[ <img src="data:image/png;base64,#imgs/porque.gif" width="800" /> ] -- .pull-right[ **BMA offers a nice framework for combining clustering solutions** - simple - flexible - intuitive ] --- ### Combining Clustering Results with Bayesian Model Averaging -- - Limited development for Clustering - Finite mixture models (Russell et al., 2015) - Naive Bayes classifiers (Santafe & Lozano, 2006) - Lacks implementation across multiple clustering algorithms -- <br> **Advantages** - 🆒 Weighted averaging of results incorporating model quality / goodness of fit -- - 💯 Intuitive framework for **probabilistic** inferences combining results from **different clustering algorithms** -- - 😮 Each input solution can have a different number of clusters `\(K\)` -- - 😎 **Quantify model-based uncertainty and enable more robust inferences** calibrated accordingly --- ### Bayesian Model Averaging: Basics <br> <br> `$$P(\Delta | Y) = \sum_{l=1}^L (\Delta | Y, M_l)P(M_l|Y)$$` -- <br> <br> `$$P(M_l|Y) = \frac{P(Y|M_l)P(M_l)}{\sum_{l=1}^L P(Y|M_l)P(M_l)}$$` --- #### BMA for Mixture Models - BIC weighting - `\(P(Y|M_l)\)` typically involves a difficult/intractable integral and is often approximated for many applications (Fragoso et al., 2018) -- - **Russell et al. (2015)** weight results from multiple GMMs according to BIC `$$P(M_l \lvert Y) \approx \frac{exp(\frac{1}{2}{BIC}_l )}{\sum_{l=1}^L { exp(\frac{1}{2} BIC_{l}) }}$$` -- - BIC definition for GMM `$${BIC}_l = 2\log(\mathcal{L}) - \kappa_m \log(N)$$` -- - GMM likelihood `$$\mathcal{L}(\Theta) = \sum_{n=1}^N \sum_{k=1}^K{\pi_k\mathcal{N}(x_n|\mu_k, \Sigma_k)}$$` -- - Multivariate Gaussian density `$$\mathcal{N}(x|\mu, \Sigma) = \frac{\exp\left\{ -\frac{1}{2} (y-\mu)^T \Sigma^{-1}(y-\mu)\right\}}{|\Sigma|^{\frac{1}{2}} (2 \pi)^{\frac{D}{2}}}$$` --- background-image: url("data:image/png;base64,#imgs/separation_compactness.png") background-position: right background-size: 600px ### Aside: Cluster internal validation indices .pull-left[.content-box-blue[ - Often used as a proxy for model quality in clustering {{content}} ] ] -- - Choose between candidate models with different numbers of clusters `\(k\)` {{content}} -- - Interpreted similarly to marginal likelihood/model evidence {{content}} -- - Typically measure compactness and/or separation of clusters {{content}} -- Compared to BIC... - Agnostic to clustering algorithm - Typically do not require likelihood term --- background-image: url("data:image/png;base64,#imgs/separation_compactness.png") background-position: top right background-size: 300px ### New proposed weighting / approximation for posterior model probability BIC for GMM driven by **Multivariate Gaussian density:** `$$\mathcal{N}(x|\mu, \Sigma) = \frac{\exp\left\{ -\frac{1}{2} (y-\mu)^T \Sigma^{-1}(y-\mu)\right\}}{|\Sigma|^{\frac{1}{2}} (2 \pi)^{\frac{D}{2}}}$$` -- **Calinski-Harabasz Index** - Ratio of separation to compactness (minimise) `$$CH = \frac{\sum_i{n_i d^2(c_i,c)/(NC-1)}}{\sum_i{\sum_{x \in C_i}d^2(x,c_i)/(n-NC)}}$$` -- **S_Dbw Index** - (add detail here:) -- - CH and S_Dbw indices are both well validated in the context of --- ### New proposed weighting / approximation for posterior model probability CH and S_Dbw indices - conceptually and mathematically similar to BIC - Unlike BIC, can be calculated + directly compared across different clustering algorithms -- New proposed weight: <!-- $$P(Y \lvert \mathcal{M}_m) \approx \hat{\mathcal{W}}_m \coloneqq \begin{cases} --> <!-- \frac{\mathcal{W}_m}{\sum_{m'=1}^M \mathcal{W}_{m'}} & \text{if `\(\mathcal{W}_m\)` is to be maximised} \\ --> <!-- \\ --> <!-- \frac{\frac{1}{\mathcal{W}_m}}{\sum_{m'=1}^M \frac{1}{\mathcal{W}_{m'}}} & \text{if `\(\mathcal{W}_m\)` is to be minimised,} --> <!-- \end{cases}$$ --> .center[ <img src="data:image/png;base64,#imgs/cbma_eqn_wt.png" width="800" /> ] <br> -- Approximate posterior model probability for weighted averaging: `$$P(Y \lvert \mathcal{M}_m) \approx \hat{\mathcal{W}}_m = \frac{\mathcal{W}_m}{\sum_{m'=1}^M \mathcal{W}_{m'}}$$` --- ### Consistent quantity `\(\Delta\)` - Similarity matrices Previous work (Russell et al., 2015) has used pairwise similarity matrices as `\(\Delta\)` for each model -- - To get similarity matrix, multiply allocation matrix by its transpose: `$$S_m = A_m A_m^T$$` - **invariant to number and labelling of clusters across solutions** -- .center[ <img src="data:image/png;base64,#imgs/a_to_s.png" width="800" /> ] --- ### Consensus matrix .center[ <img src="data:image/png;base64,#imgs/eeg_consensus.jpg" width="600" /> ] `\begin{equation} C = {\sum_{m=1}^M\hat{\mathcal{W}}_m S_m}. \end{equation}` --- ### Consensus matrix ➡️ Matrix factorisation ➡️ Cluster allocation probabilities - Symmetric Simplex Matrix Factorisation (SSMF; Duan, 2020) to get `\(N \times K\)` allocation matrix `\(A_m\)` from `\(N \times N\)` consensus matrix `\(C\)` - Generates probabilistic cluster allocations from pairwise probabilities -- .center[ <img src="data:image/png;base64,#imgs/c_to_a.png" width="700" /> ] -- - Includes L2 regularisation step to reduce overfitting & redundant clusters --- class: center, middle ## Case study: Clustering adolescents based on resting state EEG recordings --- ## Model results .center[ <img src="data:image/png;base64,#imgs/clusters_scrnshot.jpg" width="1000" /> ] --- ## Model results ➡️ Similarity matrices .pull-left[**k-means** `\(\hat{\mathcal{W}}_m = 0.36\)` <br> <img src="data:image/png;base64,#imgs/km_similarity.png" width="300" /> ] .pull-right[**HC** `\(\hat{\mathcal{W}}_m = 0.27\)` <br> <img src="data:image/png;base64,#imgs/hc_similarity.png" width="300" /> ] .center[**GMM** `\(\hat{\mathcal{W}}_m = 0.37\)` <br> <img src="data:image/png;base64,#imgs/gmm_similarity.png" width="300" /> ] --- ## Model results ➡️ Similarity matrices ➡️ Consensus matrix <img src="data:image/png;base64,#imgs/eeg_consensus.jpg" width="800" /> --- ## BMA Clusters with allocation uncertainty .center[ <iframe src="imgs/eeg_3d_BMA_size_uncertainty_301122.html" width="750" height="400" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ] - **Uncertainty can be propagated forward for further analysis in a Ba yesian framework** --- ### Benchmarking study: Feature comparison with other ensemble approaches <img src="imgs/cbma_feature_table.png" width="800" /> --- ### Benchmarking study: Accuracy comparison - 9 clustering algorithms - Simulated data with differing dimensions (2, 10, 50) and separation (high, medium, low) - 10 simulated datasets each combination of dimensions X separation -- <img src="imgs/benchmarking_results_table.png" width="700" /> --- class: center, middle ## A quick demo --- <iframe src="imgs/demo_recording_1080_tripspeed_changed.mp4" width="1000" height="650" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> --- background-image: url("data:image/png;base64,#imgs/bernie_meme.jpg") background-position: center background-size: contain --- ### Limitations & Future Directions - Comparison & sensitivity analysis of alternative internal validation metrics to approximate `\(P(\mathcal{M}_m \lvert Y)\)` -- - Fully Bayesian extension: * estimate probability distributions instead of point estimates for cluster allocation * incorporate Bayesian clustering solutions as inputs alongside frequentist clustering models --- background-image: url("data:image/png;base64,#imgs/watercolour_crop_nologo.jpg") class: middle, center # Chapter 5 # Functional analysis within latent states for neuroscientific data: Linking resting state brain activity, psychopathology and cognitive function in young people --- ### Chapter 5: Novel Contributions -- - Novel nested framework for functional data analysis (FDA) of functional time series data -- - Offers integrated analysis of functional latent states and temporal dynamics using FHMM, and functional characteristics stratified by latent states using FPCA -- - More nuanced insights into important characteristics of EEG variation relative to traditional analyses, less reliant on flawed assumptions and canonical features -- - Using HBN data, Bayesian regression models show that FHMM and FPCA outputs together are substantially associated with measures of psychopathology and cognitive function -- - Lays a foundation for more sophisticated phenotype identification including temporal and functional characteristics (Chapter 6) -- - Implemented in open source & freely available software in R for practitioners to apply across various fields -- - In preparation for submission to _Statistical Methods in Medical Research_ --- ### Functional Data Analysis - Benefits for neuroscience data - Capture influential characteristics across the whole functional domain of interest --- ### Functional Hidden Markov Models - screenshot from Martino - Similar to regular HMM, but emission function is not a PDF --> function based on L2 distance measure between each curve and the state's mean curve (centroid) --- background-image: url("data:image/png;base64,#imgs/eeg_fpca.jpg") background-position: bottom right background-size: 500px ### Functional Principal Component Analysis .content-box-red[ - Captures the essential patterns of variation in functional data in terms of scores on **eigenfunctions** (Ramsay et al., 2009) {{content}} ] -- - Capture information relating to a variety of summary features of interest with less difficulties around **high dimensionality** {{content}} -- - A few nice examples with EEG - Extract power components analogous to traditional frequency bands (Tenke & Keyser, 2005) - Bayesian fPCA in spatiotemporal domain for EEG artifact location and removal (Margaritella et al., 2020) -- --- ### Chapter 5: Methods - _flawless_: (**F**unctiona**L** **A**nalysis **W**ithin **L**at**E**nt **S**tate**S**) <img src="data:image/png;base64,#imgs/flawless_model_diagram_updated.png" width="600" /> --- ### FHMM - Latent State Centroids <img src="data:image/png;base64,#imgs/FHMM_centroids_230423.png" width="700" /> --- ### FHMM - Latent State Occupancy <img src="data:image/png;base64,#imgs/FHMM_Venn_100323_copy.png" width="400" /> Most common combinations: - State 1 only (25.6\%) - States 1 and 2 (21.5\%) - State 2 only (9.3\%) - States 3 and 4 (6.4\%) - State 3 only (6.0\%) - State 4 only (5.6\%). --- ### FHMM - Latent State Occupancy <img src="data:image/png;base64,#imgs/histograms_230423.png" width="200" /> .pull-right[ - Most visit 1 (n = 234; 46.5%) or 2 (n = 205; 40.8%) latent states - A majority (n = 479; 95.2%) make between 0 and 4 transitions between states over the recording period. ] --- ### Comparing health measures between individuals with dominant latent states <img src="data:image/png;base64,#imgs/FHMM_fingerprints_100323_copy.png" width="800" /> --- ### Bayesian regression for _flawless_ outputs and health measures -- - Bayesian regression models indicated substantial associations between: * Latent state temporal characteristics * **Functional frequency characteristics within states (FPCA scores)** * Health measures relating to psychopathology & cognitive function -- - Captures novel and nuanced information about influential characteristics and variation between individuals for resting state EEG, which are associated with important health outcomes --- background-image: url("data:image/png;base64,#imgs/watercolour_crop_nologo.jpg") class: middle, center # Chapter 6 # Data-driven phenotypes of resting state brain activity in adolescents: Cross-method, cross-sample replication and extension using functional analyses --- ### Chapter 6: Novel Contributions -- - Comprehensive replication and comparison of methods in Chapters 3 and 5, generating clusters for each using _clusterBMA_ from Chapter 4 -- - Cross-sample replication of phenotypes identified with method from Chapter 3, using HBN data -- - Cross-method replication and *extension* of phenotypes based on functional analysis outputs from Chapter 5 -- - Exciting findings - novel evidence of robust phenotypes for resting state EEG -- - Replication indicates more substantial evidence & lays foundation for clinical utility in the context of personalised healthcare and risk prediction -- - In preparation for submission to _Nature Neuroscience_ --- Pipeline 1 (Chapter 3): Time-averaged summary features - add step list -- Pipeline 2 (Chapter 5): _flawless_ analysis outputs - add step list -- 3 clustering algorithms, combined with _clusterBMA_ --- ### Refresher: Frequency content across clusters from Chapter 3 (LABS) <img src="data:image/png;base64,#imgs/p1_Fig5.png" width="400" /> --- ### Replication of Chapter 3 clusters in HBN data <img src="data:image/png;base64,#imgs/psds_p1_summary_080423.png" width="400" /> --- ### Comparison of time-averaged summary features between LABS and HBN clusters <img src="data:image/png;base64,#imgs/replication_radar_plots_220423.png" width="600" /> --- ### Health characteristics across Pipeline 1 clusters (time-averaged summary features) <img src="data:image/png;base64,#imgs/fingerprints_p1_pipeline_230423.png" width="600" /> --- ### Comparison of time-averaged clusters and FHMM Centroids <img src="data:image/png;base64,#imgs/FHMM_centroids_100323_copy.png" width="600" /> --- ### Clustering _flawless_ analysis outputs (Pipeline 2) - need to add table of cluster characteristics --- ### Health measures across phenotypes from _flawless_ outputs (Pipeline 2) <img src="data:image/png;base64,#imgs/fingerprints_p3_pipeline_230423.png" width="600" /> --- --- # Summary --- # Limitations --- # Future Directions --- # Acknowledgements --- ### References Aggarwal, C. C. and Reddy, C. K. (2014). Data clustering: Algorithms and applications. Chapman Hall/CRC Data mining and Knowledge Discovery Series. London. Babadi, B., & Brown, E. N. (2014). A review of multitaper spectral analysis. IEEE Transactions on Biomedical Engineering, 61(5), 1555-1564. Bernardo, J. M. and Smith, A. F. (2009). Bayesian theory, volume 405. John Wiley Sons. Bruckers, L., Molenberghs, G., Drinkenburg, P., & Geys, H. (2016). A clustering algorithm for multivariate longitudinal data. Journal of biopharmaceutical statistics, 26(4), 725-741. Burle, B., Spieser, L., Roger, C., Casini, L., Hasbroucq, T., and Vidal, F. (2015). Spatial and temporal resolutions of EEG: Is it really black and white? A scalp current density view. International Journal of Psychophysiology, 97(3):210-220. Cohen, M. X. (2014). Analyzing neural time series data: theory and practice. MIT press. Connell, A. M., & Frye, A. A. (2006). Growth mixture modelling in developmental psychology: Overview and demonstration of heterogeneity in developmental trajectories of adolescent antisocial behaviour. Infant and Child Development: An International Journal of Research and Practice, 15(6), 609-621. --- ### References de Lijster, J. M., van den Dries, M. A., van der Ende, J., Utens, E. M., Jaddoe, V. W., Dieleman, G. C., ... & Legerstee, J. S. (2019). Developmental trajectories of anxiety and depression symptoms from early to middle childhood: A population-based cohort study in the Netherlands. Journal of abnormal child psychology, 47(11), 1785-1798. Duffy, F. H., McAnulty, G. B., and Albert, M. S. (1996). Effects of age upon interhemispheric EEG coherence in normal adults. Neurobiol. Aging 17, 587– 599. doi: 10.1016/0197-4580(96)00007-3 Feng, J., Xu, H., & Yan, S. (2012). Robust PCA in high-dimension: A deterministic approach. arXiv preprint arXiv:1206.4628. Hartmann, J. A., Nelson, B., Spooner, R., Paul Amminger, G., Chanen, A., Davey, C. G., McHugh, M., Ratheesh, A., Treen, D., and Yuen, H. P. (2019). Broad clinical high-risk mental state (CHARMS): methodology of a cohort study validating criteria for pluripotent risk. Early Intervention in Psychiatry, 13(3):379-386. Jin, X., Liang, X., & Gong, G. (2020). Functional integration between the two brain hemispheres: evidence from the homotopic functional connectivity under resting state. Frontiers in Neuroscience. Kaiser, A. K., Gnjezda, M. T., Knasmüller, S., & Aichhorn, W. (2018). Electroencephalogram alpha asymmetry in patients with depressive disorders: current perspectives. Neuropsychiatric disease and treatment. --- ### References Keizer, A. W. (2019). Standardization and Personalized Medicine Using Quantitative EEG in Clinical Settings. Clinical EEG and neuroscience, page 1550059419874945. Ke, L., & Li, R. (2009, November). Classification of EEG signals by multi-scale filtering and PCA. In 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems (Vol. 1, pp. 362-366). IEEE. Latzman, R. D. and DeYoung, C. G. (2020). Using empirically-derived dimensional phenotypes to accelerate clinical neuroscience: The Hierarchical Taxonomy of Psychopathology (HiTOP) framework. Neuropsychopharmacology, 45(7):1083-1085. Liedorp, M., Van Der Flier, W., Hoogervorst, E., Scheltens, P., and Stam, C. (2009). Associations between patterns of EEG abnormalities and diagnosis in a large memory clinic cohort. Dementia and geriatric cognitive disorders, 27(1):18-23. Loo, S. K., Lenartowicz, A., and Makeig, S. (2016). Research review: Use of EEG biomarkers in child psychiatry research{current state and future directions. Journal of Child Psychology and Psychiatry, 57(1):4-17. Margaritella, N., Inácio, V., & King, R. (2021). Parameter clustering in Bayesian functional principal component analysis of neuroscientific data. Statistics in Medicine, 40(1), 167-184. McGorry, P. D. and Mei, C. (2018). Early intervention in youth mental health: progress and future directions. Evidence-based mental health, 21(4):182-184. --- ### References McGorry, P. D., Purcell, R., Goldstone, S., and Amminger, G. P. (2011). Age of onset and timing of treatment for mental and substance use disorders: implications for preventive intervention strategies and models of care. Current opinion in psychiatry, 24(4):301-306. Newson, J. J., & Thiagarajan, T. C. (2019). EEG frequency bands in psychiatric disorders: a review of resting state studies. Frontiers in human neuroscience, 12, 521. Paus, T., Keshavan, M., and Giedd, J. N. (2008). Why do many psychiatric disorders emerge during adolescence? Nature reviews neuroscience, 9(12):947-957. Prerau, M. J., Brown, R. E., Bianchi, M. T., Ellenbogen, J. M., & Purdon, P. L. (2017). Sleep neurophysiological dynamics through the lens of multitaper spectral analysis. Physiology, 32(1), 60-92. Sawyer, M. G., Reece, C. E., Sawyer, A. C., Johnson, S. E., and Lawrence, D. (2018). Has the prevalence of child and adolescent mental disorders in Australia changed between 1998 and 2013 to 2014? Journal of the American Academy of Child Adolescent Psychiatry, 57(5):343-350. e5. Sun, S. and Zhou, J. (2014). A review of adaptive feature extraction and classification methods for EEG-based brain-computer interfaces. In 2014 International Joint Conference on Neural Networks (IJCNN), pages 1746-1753. IEEE. --- ### References Raballo, A. and Poletti, M. (2020). Advances in early identification of children and adolescents at risk for psychiatric illness. Current Opinion in Psychiatry, 33(6):611-617. Ramsay, J. O., Hooker, G., & Graves, S. (2009). Functional data analysis with R and MATLAB: Springer Science & Business Media. Tenke, C. E., & Kayser, J. (2005). Reference-free quantification of EEG spectra: combining current source density (CSD) and frequency principal components analysis (fPCA). Clinical Neurophysiology, 116(12), 2826-2846. Chicago Wu, W., Nagarajan, S., and Chen, Z. (2015). Bayesian Machine Learning: EEGMEG signal processing measurements. IEEE Signal Processing Magazine, 33(1):14-36. --- .pull-left[ #### *k*-means - 'Hard' clustering - Minimises within-cluster sums of squares] .pull-right[**k-means objective function** <br> `$$J = \sum_{i = 1}^K (\sum_k{\lvert \lvert x_k - c_i \rvert \rvert ^2})$$`] -- .pull-left[ #### Hierarchical Clustering (Ward's Method) - 'Hard' clustering - Each observation starts out in its own cluster - Repeated pairwise fusion of clusters that minimises change in within-cluster sums of squares (Ward)] .pull-right[**Ward's objective function** <br> `$$D(c_1,c_2) = \delta^2(c_1,c_2) = \frac{\lvert c_1 \rvert \lvert c_2 \rvert }{\lvert c_1 \rvert + \lvert c_2 \rvert} \lvert \lvert c_1 - c_2 \rvert \rvert ^2$$`] -- .pull-left[ #### Gaussian Mixture Model - 'Soft' clustering - Models data as coming from a mixture of Gaussian distributions - Restrictive distribution ('thin tails') - Tends to prioritise cluster compactness, may tend to 'overcluster'] .pull-right[**Mixture of multivariate Gaussians** <br> `$$p(x_n|\mu, \Sigma, \pi,K) = \sum_{k=1}^K{\pi_k\mathcal{N}(x_n|\mu_k, \Sigma_k)}$$`] ---